An Evaluation of Synthetic Speech Using the PESQ Measure
Authors
Abstract
The paper presents experiments on using the perceptual objective measure defined in ITU-T Rec. P.862, Perceptual Evaluation of Speech Quality (PESQ), for the automatic evaluation of synthetic speech. The approach rests on testing for a statistically significant correlation between the outputs of subjective and objective tests. We propose the following technique for assessing whether the PESQ method is suitable for synthetic speech. First, a list of test words is defined for the entire language. Second, the tested synthesizers generate a synthetic speech signal for every word in the list. Synthesizer engines of differing quality were used to generate the stimuli: an LP synthesizer, a RELP synthesizer and a PSOLA synthesizer, each in female and male versions. The resulting stimuli were evaluated in listening tests. Third, the PESQ method, with the original human (reference) and synthesized (measured) recordings as inputs, is used to evaluate the overall quality of the synthesized signals. Finally, the correlation between the resulting subjective MOS and objective MOS scores is calculated for each voice. Our results indicate a strong correlation between the subjective and objective evaluations of synthetic speech quality. We plan to use the PESQ measure for the automatic evaluation of new versions of synthetic voices, without the need for subjective tests. This approach can greatly shorten the development cycle of new synthetic voice versions. Using PESQ with the “original voice” as reference is a rapid and repeatable technique for measuring synthetic voice quality that gives the developer results within moments.
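The final step of the procedure, correlating subjective and objective MOS scores for one voice, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say which correlation coefficient or significance test was used, so the Pearson coefficient and the t statistic here are assumptions, the function names are hypothetical, and the score lists are placeholder values, not results from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two lists of equal length with at least 3 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for the null hypothesis of no correlation (n - 2 d.o.f.)."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Placeholder scores for a single voice (assumed values, NOT from the paper):
# one subjective MOS from listening tests and one objective PESQ-derived MOS
# per test condition.
subjective_mos = [1.8, 2.4, 3.1, 3.6, 4.2]
objective_mos = [1.5, 2.2, 2.9, 3.1, 3.9]

r = pearson_r(subjective_mos, objective_mos)
t = t_statistic(r, len(subjective_mos))
print(f"r = {r:.3f}, t = {t:.2f}, degrees of freedom = {len(subjective_mos) - 2}")
```

The t value would then be compared against a critical value for the chosen significance level. In practice the objective scores would come from a P.862 implementation run on each reference and synthesized recording pair, which this sketch leaves out.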
Similar resources
Diagnostic Evaluation of Synthetic Speech Using Speech Recognition
The paper presents experiments on the use of automatic speech recognition for diagnostic evaluation of synthetic speech. Our previous work on the topic showed a strong correlation between the subjective and objective evaluation (ITU-T Rec. P.862 PESQ) of the quality of synthetic speech. The main drawback of the approach was the need for original human (reference) recordings in one to one mappin...
Speech enhancement based on hidden Markov model using sparse code shrinkage
This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a maximum a posteriori (MAP) estimator based on a Laplace-Gaussian (for clean speech and noise respectively) combination in the HMM ...
Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs
Previous objective speech quality assessment models, such as bark spectral distortion (BSD), the perceptual speech quality measure (PSQM), and measuring normalizing blocks (MNB), have been found to be suitable for assessing only a limited range of distortions. A new model has therefore been developed for use across a wider range of network conditions, including analogue connections, codecs, pac...
Objective Quality Assessment of Wideband Speech Coding using W-PESQ Measure and Artificial Voice
An objective quality measurement methodology for wideband-speech coding has been studied, its essential components being an objective quality measure and an input test signal. Wideband-PESQ conforming to draft Recommendation P.862 has been studied as the objective quality measure. The Wideband-PESQ has been verified from the viewpoint of the consistency between subjectively evaluated MOS and ob...
Harmonics Enhancement for Determined Blind Sources Separation using Source’s Excitation Characteristics
We present an improved method for combining temporal and spectral processing approaches for multichannel determined blind source separation. The separation task is performed by applying spectral processing to mixed speech, using the sources’ excitation characteristics. The performance of the proposed method is investigated by separating two sources from a stereo recording mixture extracted fr...
Publication date: 2005